{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 🎯 CSPOT Phenotyping\n",
    "Assign phenotypes to each cell. Clustering data may not always be ideal, so we developed a cell type assignment algorithm that does a hierarchical assignment process iteratively.\n",
    "  \n",
    "Please keep in mind that the sample data is used for demonstration purposes only and has been simplified and reduced in size. It is solely intended for educational purposes on how to execute `cspot` and will not yeild any meaningful results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Download [executable notebook here](https://github.com/nirmallab/cspot/blob/main/docs/Tutorials/notebooks/PhenotypeCells.ipynb).\n",
    "  \n",
    "Make sure you have completed `Build cepot Model` and `Run cspot Algorithm` Tutorial before you try to execute this Jupyter Notebook!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import packages\n",
    "import cspot as cs\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**We need `two` basic inputs to perform phenotyping with CSPOT**\n",
    "- The cspot Object\n",
    "- A Phenotyping workflow based on prior knowledge"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set the Project directory\n",
    "projectDir = '/Users/aj/Documents/cspotExampleData'\n",
    "# Path to the CSPOT Object\n",
    "csObject = projectDir + '/CSPOT/csObject/exampleImage_cspotPredict.ome.h5ad'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style type=\"text/css\">\n",
       "</style>\n",
       "<table id=\"T_fb8d4\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th class=\"blank level0\" >&nbsp;</th>\n",
       "      <th id=\"T_fb8d4_level0_col0\" class=\"col_heading level0 col0\" >Unnamed: 0</th>\n",
       "      <th id=\"T_fb8d4_level0_col1\" class=\"col_heading level0 col1\" >Unnamed: 1</th>\n",
       "      <th id=\"T_fb8d4_level0_col2\" class=\"col_heading level0 col2\" >ECAD</th>\n",
       "      <th id=\"T_fb8d4_level0_col3\" class=\"col_heading level0 col3\" >CD45</th>\n",
       "      <th id=\"T_fb8d4_level0_col4\" class=\"col_heading level0 col4\" >CD4</th>\n",
       "      <th id=\"T_fb8d4_level0_col5\" class=\"col_heading level0 col5\" >CD3D</th>\n",
       "      <th id=\"T_fb8d4_level0_col6\" class=\"col_heading level0 col6\" >CD8A</th>\n",
       "      <th id=\"T_fb8d4_level0_col7\" class=\"col_heading level0 col7\" >KI67</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th id=\"T_fb8d4_level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
       "      <td id=\"T_fb8d4_row0_col0\" class=\"data row0 col0\" >all</td>\n",
       "      <td id=\"T_fb8d4_row0_col1\" class=\"data row0 col1\" >Immune</td>\n",
       "      <td id=\"T_fb8d4_row0_col2\" class=\"data row0 col2\" ></td>\n",
       "      <td id=\"T_fb8d4_row0_col3\" class=\"data row0 col3\" >anypos</td>\n",
       "      <td id=\"T_fb8d4_row0_col4\" class=\"data row0 col4\" >anypos</td>\n",
       "      <td id=\"T_fb8d4_row0_col5\" class=\"data row0 col5\" >anypos</td>\n",
       "      <td id=\"T_fb8d4_row0_col6\" class=\"data row0 col6\" >anypos</td>\n",
       "      <td id=\"T_fb8d4_row0_col7\" class=\"data row0 col7\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th id=\"T_fb8d4_level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
       "      <td id=\"T_fb8d4_row1_col0\" class=\"data row1 col0\" >all</td>\n",
       "      <td id=\"T_fb8d4_row1_col1\" class=\"data row1 col1\" >ECAD+</td>\n",
       "      <td id=\"T_fb8d4_row1_col2\" class=\"data row1 col2\" >pos</td>\n",
       "      <td id=\"T_fb8d4_row1_col3\" class=\"data row1 col3\" ></td>\n",
       "      <td id=\"T_fb8d4_row1_col4\" class=\"data row1 col4\" ></td>\n",
       "      <td id=\"T_fb8d4_row1_col5\" class=\"data row1 col5\" ></td>\n",
       "      <td id=\"T_fb8d4_row1_col6\" class=\"data row1 col6\" ></td>\n",
       "      <td id=\"T_fb8d4_row1_col7\" class=\"data row1 col7\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th id=\"T_fb8d4_level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
       "      <td id=\"T_fb8d4_row2_col0\" class=\"data row2 col0\" >ECAD+</td>\n",
       "      <td id=\"T_fb8d4_row2_col1\" class=\"data row2 col1\" >KI67+ ECAD+</td>\n",
       "      <td id=\"T_fb8d4_row2_col2\" class=\"data row2 col2\" ></td>\n",
       "      <td id=\"T_fb8d4_row2_col3\" class=\"data row2 col3\" ></td>\n",
       "      <td id=\"T_fb8d4_row2_col4\" class=\"data row2 col4\" ></td>\n",
       "      <td id=\"T_fb8d4_row2_col5\" class=\"data row2 col5\" ></td>\n",
       "      <td id=\"T_fb8d4_row2_col6\" class=\"data row2 col6\" ></td>\n",
       "      <td id=\"T_fb8d4_row2_col7\" class=\"data row2 col7\" >pos</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th id=\"T_fb8d4_level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
       "      <td id=\"T_fb8d4_row3_col0\" class=\"data row3 col0\" >Immune</td>\n",
       "      <td id=\"T_fb8d4_row3_col1\" class=\"data row3 col1\" >CD4+ T</td>\n",
       "      <td id=\"T_fb8d4_row3_col2\" class=\"data row3 col2\" ></td>\n",
       "      <td id=\"T_fb8d4_row3_col3\" class=\"data row3 col3\" ></td>\n",
       "      <td id=\"T_fb8d4_row3_col4\" class=\"data row3 col4\" >allpos</td>\n",
       "      <td id=\"T_fb8d4_row3_col5\" class=\"data row3 col5\" >allpos</td>\n",
       "      <td id=\"T_fb8d4_row3_col6\" class=\"data row3 col6\" ></td>\n",
       "      <td id=\"T_fb8d4_row3_col7\" class=\"data row3 col7\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th id=\"T_fb8d4_level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
       "      <td id=\"T_fb8d4_row4_col0\" class=\"data row4 col0\" >Immune</td>\n",
       "      <td id=\"T_fb8d4_row4_col1\" class=\"data row4 col1\" >CD8+ T</td>\n",
       "      <td id=\"T_fb8d4_row4_col2\" class=\"data row4 col2\" ></td>\n",
       "      <td id=\"T_fb8d4_row4_col3\" class=\"data row4 col3\" ></td>\n",
       "      <td id=\"T_fb8d4_row4_col4\" class=\"data row4 col4\" ></td>\n",
       "      <td id=\"T_fb8d4_row4_col5\" class=\"data row4 col5\" >allpos</td>\n",
       "      <td id=\"T_fb8d4_row4_col6\" class=\"data row4 col6\" >allpos</td>\n",
       "      <td id=\"T_fb8d4_row4_col7\" class=\"data row4 col7\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th id=\"T_fb8d4_level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
       "      <td id=\"T_fb8d4_row5_col0\" class=\"data row5 col0\" >Immune</td>\n",
       "      <td id=\"T_fb8d4_row5_col1\" class=\"data row5 col1\" >Non T CD4+ cells</td>\n",
       "      <td id=\"T_fb8d4_row5_col2\" class=\"data row5 col2\" ></td>\n",
       "      <td id=\"T_fb8d4_row5_col3\" class=\"data row5 col3\" ></td>\n",
       "      <td id=\"T_fb8d4_row5_col4\" class=\"data row5 col4\" >pos</td>\n",
       "      <td id=\"T_fb8d4_row5_col5\" class=\"data row5 col5\" >neg</td>\n",
       "      <td id=\"T_fb8d4_row5_col6\" class=\"data row5 col6\" ></td>\n",
       "      <td id=\"T_fb8d4_row5_col7\" class=\"data row5 col7\" ></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
       "<pandas.io.formats.style.Styler at 0x110bfa580>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# load the phenotyping workflow\n",
    "phenotype = pd.read_csv(str(projectDir) + '/phenotype_workflow.csv')\n",
    "# view the table:\n",
    "phenotype.style.format(na_rep='')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As it can be seen from the table above,  \n",
    "(1) The `first column` has to contain the cell that are to be classified.  \n",
    "(2) The `second column` indicates the phenotype a particular cell will be assigned if it satifies the conditions in the row.  \n",
    "(3) `Column three` and onward represent protein markers. If the protein marker is known to be expressed for that cell type, then it is denoted by either `pos`, `allpos`. If the protein marker is known to not express for a cell type it can be denoted by `neg`, `allneg`. If the protein marker is irrelevant or uncertain to express for a cell type, then it is left empty. `anypos` and `anyneg` are options for using a set of markers and if any of the marker is positive or negative, the cell type is denoted accordingly."
   ]
  },
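  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The workflow does not have to be typed into a spreadsheet by hand. As a minimal sketch, the table shown above could be assembled with pandas and serialised as CSV text. The names `markers`, `rows` and `workflow` are illustrative only, the column labels `Unnamed: 0` and `Unnamed: 1` simply mirror what `pd.read_csv` displayed above, and the sketch assumes that the first two columns are identified by position rather than by name.\n",
    "\n",
    "```python\n",
    "import io\n",
    "import pandas as pd\n",
    "\n",
    "# marker columns, in the order shown in the table above\n",
    "markers = ['ECAD', 'CD45', 'CD4', 'CD3D', 'CD8A', 'KI67']\n",
    "\n",
    "# one entry per workflow row: (group to classify, phenotype to assign, marker rules)\n",
    "rows = [\n",
    "    ('all', 'Immune', {'CD45': 'anypos', 'CD4': 'anypos', 'CD3D': 'anypos', 'CD8A': 'anypos'}),\n",
    "    ('all', 'ECAD+', {'ECAD': 'pos'}),\n",
    "    ('ECAD+', 'KI67+ ECAD+', {'KI67': 'pos'}),\n",
    "    ('Immune', 'CD4+ T', {'CD4': 'allpos', 'CD3D': 'allpos'}),\n",
    "    ('Immune', 'CD8+ T', {'CD3D': 'allpos', 'CD8A': 'allpos'}),\n",
    "    ('Immune', 'Non T CD4+ cells', {'CD4': 'pos', 'CD3D': 'neg'}),\n",
    "]\n",
    "\n",
    "# markers without a rule stay empty (None), matching the blank table entries\n",
    "records = [[group, name] + [rules.get(m) for m in markers] for group, name, rules in rows]\n",
    "workflow = pd.DataFrame(records, columns=['Unnamed: 0', 'Unnamed: 1'] + markers)\n",
    "\n",
    "# serialise to CSV text and read it back to confirm the layout survives a round trip\n",
    "csv_text = workflow.to_csv(index=False)\n",
    "roundtrip = pd.read_csv(io.StringIO(csv_text))\n",
    "print(roundtrip.fillna(''))\n",
    "```\n",
    "\n",
    "To use the result as input, `csv_text` could be written to a file such as `phenotype_workflow.csv` in the project directory."
   ]
  },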
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**To give users maximum flexibility in identifying desired cell types, we have implemented various classification arguments as described above for strategical classification. They include**\n",
    "\n",
    "- allpos\n",
    "- allneg\n",
    "- anypos\n",
    "- anyneg\n",
    "- pos\n",
    "- neg\n",
    "  \n",
    "`pos` : \"Pos\" looks for cells positive for a given marker. If multiple markers are annotated as `pos`, all must be positive to denote the cell type. For example, a Regulatory T cell can be defined as `CD3+CD4+FOXP3+` by passing `pos` to each marker. If one or more markers don't meet the criteria (e.g. CD4-), the program will classify it as `Likely-Regulatory-T cell`, pending user confirmation. This is useful in cases of technical artifacts or when cell types (such as cancer cells) are defined by marker loss (e.g. T-cell Lymphomas).\n",
    "  \n",
    "`neg` : Same as `pos` but looks for negativity of the defined markers. \n",
    "  \n",
    "`allpos` : \"Allpos\" requires all defined markers to be positive. Unlike `pos`, it doesn't classify cells as `Likely-cellType`, but strictly annotates cells positive for all defined markers.\n",
    "  \n",
    "`allneg` : Same as `allpos` but looks for negativity of the defined markers. \n",
    "  \n",
    "`anypos` : \"Anypos\" requires only one of the defined markers to be positive. For example, to define macrophages, a cell could be designated as such if any of `CD68`, `CD163`, or `CD206` is positive.\n",
    "  \n",
    "`anyneg` : Same as `anyneg` but looks for negativity of the defined markers. "
   ]
  },
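  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the difference between these arguments concrete, here is a toy sketch of how one workflow row could be turned into a boolean mask over cells. It is not the `csPhenotype` implementation: `rule_mask`, the `scores` table and its values are hypothetical, the sketch assumes a cell is positive for a marker when its value is at or above `midpoint`, and it reads `pos` and `neg` strictly, ignoring the `Likely-` fallback described above.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "def rule_mask(scores, rules, midpoint=0.5):\n",
    "    # hypothetical helper: True for every cell that satisfies one workflow row\n",
    "    positive = scores >= midpoint  # assumption: at or above midpoint means positive\n",
    "    mask = pd.Series(True, index=scores.index)\n",
    "    all_pos = [m for m, r in rules.items() if r in ('pos', 'allpos')]\n",
    "    all_neg = [m for m, r in rules.items() if r in ('neg', 'allneg')]\n",
    "    any_pos = [m for m, r in rules.items() if r == 'anypos']\n",
    "    any_neg = [m for m, r in rules.items() if r == 'anyneg']\n",
    "    if all_pos:  # every listed marker must be positive\n",
    "        mask &= positive[all_pos].all(axis=1)\n",
    "    if all_neg:  # every listed marker must be negative\n",
    "        mask &= (~positive[all_neg]).all(axis=1)\n",
    "    if any_pos:  # at least one listed marker must be positive\n",
    "        mask &= positive[any_pos].any(axis=1)\n",
    "    if any_neg:  # at least one listed marker must be negative\n",
    "        mask &= (~positive[any_neg]).any(axis=1)\n",
    "    return mask\n",
    "\n",
    "# made-up marker values for three cells\n",
    "scores = pd.DataFrame(\n",
    "    {'CD4': [0.9, 0.8, 0.1], 'CD3D': [0.7, 0.2, 0.9], 'CD8A': [0.1, 0.1, 0.8]},\n",
    "    index=['cell_1', 'cell_2', 'cell_3'],\n",
    ")\n",
    "\n",
    "cd4_t = rule_mask(scores, {'CD4': 'allpos', 'CD3D': 'allpos'})\n",
    "non_t_cd4 = rule_mask(scores, {'CD4': 'pos', 'CD3D': 'neg'})\n",
    "immune = rule_mask(scores, {'CD4': 'anypos', 'CD3D': 'anypos', 'CD8A': 'anypos'})\n",
    "print(pd.DataFrame({'CD4+ T': cd4_t, 'Non T CD4+ cells': non_t_cd4, 'Immune': immune}))\n",
    "```\n",
    "\n",
    "With these toy values, only `cell_1` satisfies the `CD4+ T` rule, only `cell_2` satisfies the `Non T CD4+ cells` rule, and all three cells satisfy the `anypos` rule."
   ]
  },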
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Phenotyping Immune\n",
      "Phenotyping ECAD+\n",
      "-- Subsetting ECAD+\n",
      "Phenotyping KI67+ ECAD+\n",
      "-- Subsetting Immune\n",
      "Phenotyping CD4+ T\n",
      "Phenotyping CD8+ T\n",
      "Phenotyping Non T CD4+ cells\n",
      "Consolidating the phenotypes across all groups\n",
      "Modified csObject is stored at \"/Users/aj/Documents/cspotExampleData/CSPOT/csPhenotype\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/cspot/csPhenotype.py:259: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  allpos_score['score'] = allpos_score.max(axis=1)\n",
      "/Users/aj/miniconda3/envs/cspot/lib/python3.9/site-packages/cspot/csPhenotype.py:259: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  allpos_score['score'] = allpos_score.max(axis=1)\n"
     ]
    }
   ],
   "source": [
    "adata = cs.csPhenotype ( csObject=csObject,\n",
    "                            phenotype=phenotype,\n",
    "                            midpoint = 0.5,\n",
    "                            label=\"phenotype\",\n",
    "                            imageid='imageid',\n",
    "                            pheno_threshold_percent=None,\n",
    "                            pheno_threshold_abs=None,\n",
    "                            fileName=None,\n",
    "                            projectDir=projectDir)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Same function if the user wants to run it via Command Line Interface**\n",
    "```\n",
    "python csPhenotype.py \\\n",
    "            --csObject /Users/aj/Documents/cspotExampleData/CSPOT/csObject/exampleImage_cspotPredict.ome.h5ad \\\n",
    "            --phenotype /Users/aj/Documents/cspotExampleData/phenotype_workflow.csv \\\n",
    "            --projectDir /Users/aj/Documents/cspotExampleData\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**If you had provided `projectDir` the modified csObject would be stored in `CSPOT/csPhenotype/`, else, the object will be returned to memory.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "KI67+ ECAD+    6159\n",
       "CD4+ T         5785\n",
       "CD8+ T          816\n",
       "Name: phenotype, dtype: int64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check the identified phenotypes\n",
    "adata.obs['phenotype'].value_counts()"
   ]
  },
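  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Raw counts can be easier to compare as proportions. The sketch below rebuilds the counts printed above as a standalone pandas Series, so that it runs without the csObject, and converts them to percentages. `counts` and `percent` are illustrative names.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# counts copied from the value_counts() output above\n",
    "counts = pd.Series({'KI67+ ECAD+': 6159, 'CD4+ T': 5785, 'CD8+ T': 816}, name='phenotype')\n",
    "\n",
    "# share of each phenotype among the cells listed above, in percent\n",
    "percent = (counts / counts.sum() * 100).round(1)\n",
    "print(percent)\n",
    "```\n",
    "\n",
    "On the real object, `adata.obs['phenotype'].value_counts(normalize=True)` should give the same kind of summary as fractions."
   ]
  },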
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Tutorial Ends here (check out some of the helper functions!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  },
  "vscode": {
   "interpreter": {
    "hash": "4d975fac4fcc437c670ab44b5da89fd54fa784afb4bff9f75c9477844a77bbbe"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
